AITopics

Genre: Research Report (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Communications (0.68)

Neural Information Processing Systems, Feb-16-2026, 15:28:20 GMT

b29ab822442a1616f9bd390fddf6e425-Supplemental-Conference.pdf

artificial intelligence, consistency, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Neural Information Processing Systems, Feb-9-2026, 05:59:48 GMT

5b288823575bb29654b0953a251e933b-Paper-Conference.pdf

comput, msanet, pattern recog, (13 more...)

Country:

Asia > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Communications (0.68)

Neural Information Processing Systems, Dec-25-2025, 14:26:58 GMT

MCMAE: Masked Convolution Meets Masked Autoencoders

Vision Transformers (ViT) have become widely adopted architectures for various vision tasks. Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potential of ViT, leading to state-of-the-art performance on image classification, detection and semantic segmentation. In this paper, our MCMAE framework demonstrates that a multi-scale hybrid convolution-transformer can learn more discriminative representations via the masked auto-encoding scheme. However, directly using the original masking strategy leads to heavy computational cost and a pretraining-finetuning discrepancy. To tackle these issues, we adopt masked convolution to prevent information leakage in the convolution blocks. A simple block-wise masking strategy is proposed to ensure computational efficiency. We also propose to more directly supervise the multi-scale features of the encoder to boost multi-scale features. Based on our pretrained MCMAE models, MCMAE-Base improves ImageNet-1K finetuning accuracy by 1.4% compared with MAE-Base. On object detection, MCMAE-Base finetuned for only 25 epochs surpasses MAE-Base finetuned for 100 epochs by 2.9% box AP and 2.2% mask AP respectively.
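The two mechanisms named in the abstract, block-wise masking and masked convolution, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes that the mask is sampled once on the coarsest token grid and repeated up to finer resolutions, and that a masked convolution zeroes masked inputs before the kernel is applied and keeps masked outputs at zero. The function names, the single-channel 3x3 kernel, and the grid sizes are hypothetical choices made for illustration only.

```python
import random


def blockwise_mask(coarse_h, coarse_w, mask_ratio, scale, seed=0):
    """Sample a mask on the coarse grid and repeat it to a finer grid.

    Returns (coarse_mask, fine_mask) as nested lists; 1 = masked, 0 = visible.
    Assumption: every fine cell inherits the state of the coarse block above it.
    """
    rng = random.Random(seed)
    n_cells = coarse_h * coarse_w
    n_masked = int(n_cells * mask_ratio)
    masked = set(rng.sample(range(n_cells), n_masked))
    coarse = [[1 if r * coarse_w + c in masked else 0 for c in range(coarse_w)]
              for r in range(coarse_h)]
    fine = [[coarse[r // scale][c // scale] for c in range(coarse_w * scale)]
            for r in range(coarse_h * scale)]
    return coarse, fine


def masked_conv3x3(x, mask, kernel):
    """Single-channel 3x3 convolution with zero padding that ignores masked cells.

    Masked inputs are zeroed before the kernel is applied, and masked output
    positions stay at zero, so no value crosses the visible/masked boundary.
    """
    h, w = len(x), len(x[0])
    visible = [[x[r][c] * (1 - mask[r][c]) for c in range(w)] for r in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                continue  # masked outputs are left at zero
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr + 1][dc + 1] * visible[rr][cc]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    coarse, fine = blockwise_mask(4, 4, mask_ratio=0.75, scale=2)
    ones = [[1.0] * 8 for _ in range(8)]
    box = [[1.0] * 3 for _ in range(3)]
    features = masked_conv3x3(ones, fine, box)
    print(sum(map(sum, coarse)), "of 16 coarse blocks masked")
```

Because the fine mask is a pure repetition of the coarse one, a visible coarse block corresponds to a fully visible patch at every finer stage, which is what would let an encoder process only visible tokens.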

masked convolution meet masked autoencoder, mcmae, name change, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

arXiv.org Artificial Intelligence, Nov-26-2025

From Passive Perception to Active Memory: A Weakly Supervised Image Manipulation Localization Framework Driven by Coarse-Grained Annotations

Guo, Zhiqing, Xi, Dongdong, Li, Songlin, Yang, Gaobo

Image manipulation localization (IML) faces a fundamental trade-off between minimizing annotation cost and achieving fine-grained localization accuracy. Existing fully-supervised IML methods depend heavily on dense pixel-level mask annotations, which limits scalability to large datasets or real-world deployment. In contrast, the majority of existing weakly-supervised IML approaches are based on image-level labels, which greatly reduce annotation effort but typically lack precise spatial localization. To address this dilemma, we propose BoxPromptIML, a novel weakly-supervised IML framework that effectively balances annotation cost and localization performance. Specifically, we propose a coarse region annotation strategy, which can generate relatively accurate manipulation masks at lower cost. To improve model efficiency and facilitate deployment, we further design an efficient lightweight student model, which learns to perform fine-grained localization through knowledge distillation from a fixed teacher model based on the Segment Anything Model (SAM). Moreover, inspired by the human subconscious memory mechanism, our feature fusion module employs a dual-guidance strategy that actively contextualizes recalled prototypical patterns with real-time observational cues derived from the input. Instead of passive feature extraction, this strategy enables a dynamic process of knowledge recollection, where long-term memory is adapted to the specific context of the current image, significantly enhancing localization accuracy and robustness. Extensive experiments across both in-distribution and out-of-distribution datasets show that BoxPromptIML outperforms or rivals fully-supervised models, while maintaining strong generalization, low annotation cost, and efficient deployment characteristics.

artificial intelligence, localization, machine learning, (17 more...)

2511.20359

Genre: Research Report > New Finding (0.46)

Industry:

Media (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Nov-3-2025

HiF-DTA: Hierarchical Feature Learning Network for Drug-Target Affinity Prediction

Li, Minghui, Wang, Yuanhang, Guo, Peijin, Wan, Wei, Hu, Shengshan, Hu, Shengqing

Accurate prediction of Drug-Target Affinity (DTA) is crucial for reducing experimental costs and accelerating early screening in computational drug discovery. While sequence-based deep learning methods avoid reliance on costly 3D structures, they still overlook simultaneous modeling of global sequence semantic features and local topological structural features within drugs and proteins, and represent drugs as flat sequences without atomic-level, substructural-level, and molecular-level multi-scale features. We propose HiF-DTA, a hierarchical network that adopts a dual-pathway strategy to extract both global sequence semantic and local topological features from drug and protein sequences, and models drugs at multiple scales to learn atomic, substructural, and molecular representations fused via a multi-scale bilinear attention module. Experiments on Davis, KIBA, and Metz datasets show HiF-DTA outperforms state-of-the-art baselines, with ablations confirming the importance of global-local extraction and multi-scale fusion. Accurate prediction of drug-target affinity (DTA) is essential for drug screening, immune modulation and precision medicine.

artificial intelligence, machine learning, prediction, (15 more...)

2510.27281

Country: Asia > China (0.16)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-15-2025

MEASURE: Multi-scale Minimal Sufficient Representation Learning for Domain Generalization in Sleep Staging

Jo, Sangmin, Yoon, Jee Seok, Jeong, Wootaek, Oh, Kwanseok, Suk, Heung-Il

Deep learning-based automatic sleep staging has significantly advanced in performance and plays a crucial role in the diagnosis of sleep disorders. However, those models often struggle to generalize on unseen subjects due to variability in physiological signals, resulting in degraded performance in out-of-distribution scenarios. To address this issue, domain generalization approaches have recently been studied to ensure generalized performance on unseen domains during training. Among those techniques, contrastive learning has proven its validity in learning domain-invariant features by aligning samples of the same class across different domains. Despite its potential, many existing methods are insufficient to extract adequately domain-invariant representations, as they do not explicitly address domain characteristics embedded within the unshared information across samples. In this paper, we posit that mitigating such domain-relevant attributes, referred to as excess domain-relevant information, is key to bridging the domain gap. However, the direct strategy to mitigate the domain-relevant attributes often overfits features at the high-level information, limiting their ability to leverage the diverse temporal and spectral information encoded in the multiple feature levels. To address these limitations, we propose a novel MEASURE (Multi-scalE minimAl SUfficient Representation lEarning) framework, which effectively reduces domain-relevant information while preserving essential temporal and spectral features for sleep stage classification. In our exhaustive experiments on publicly available sleep staging benchmark datasets, SleepEDF-20 and MASS, our proposed method consistently outperformed state-of-the-art methods.

artificial intelligence, information, machine learning, (18 more...)

2510.1207

Country: North America (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Sleep (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.88)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Sep-24-2025

AdaMixT: Adaptive Weighted Mixture of Multi-Scale Expert Transformers for Time Series Forecasting

Zhang, Huanyao, Lin, Jiaye, Zhang, Wentao, Yuan, Haitao, Li, Guoliang

Multivariate time series forecasting involves predicting future values based on historical observations. However, existing approaches primarily rely on predefined single-scale patches or lack effective mechanisms for multi-scale feature fusion. These limitations hinder them from fully capturing the complex patterns inherent in time series, leading to constrained performance and insufficient generalizability. To address these challenges, we propose a novel architecture named Adaptive Weighted Mixture of Multi-Scale Expert Transformers (AdaMixT). Specifically, AdaMixT introduces various patches and leverages both General Pre-trained Models (GPM) and Domain-specific Models (DSM) for multi-scale feature extraction. To accommodate the heterogeneity of temporal features, AdaMixT incorporates a gating network that dynamically allocates weights among different experts, enabling more accurate predictions through adaptive multi-scale fusion. Comprehensive experiments on eight widely used benchmarks, including Weather, Traffic, Electricity, ILI, and four ETT datasets, consistently demonstrate the effectiveness of AdaMixT in real-world scenarios.
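The gating idea described in the abstract, a network that assigns weights to several experts whose forecasts are then fused, can be sketched as follows. This is a generic mixture-of-experts illustration, not the AdaMixT code. It assumes a single linear gating layer followed by a softmax, and a fusion step that is a plain weighted sum of expert forecasts. All names, shapes, and numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(features, gate_weights):
    """One linear layer: logits[k] = sum_j features[j] * gate_weights[k][j]."""
    return [sum(f * w for f, w in zip(features, row)) for row in gate_weights]


def mix_experts(expert_forecasts, weights):
    """Fuse per-expert forecasts (one list per expert) into a weighted sum."""
    horizon = len(expert_forecasts[0])
    return [sum(w * forecast[t] for w, forecast in zip(weights, expert_forecasts))
            for t in range(horizon)]


if __name__ == "__main__":
    # Two hypothetical experts, e.g. one per patch size, forecasting 3 steps.
    forecasts = [[1.0, 1.5, 2.0], [3.0, 2.5, 2.0]]
    features = [0.2, -0.4]                 # summary of the input window (assumed)
    gate_w = [[1.0, 0.0], [0.0, 1.0]]      # illustrative gating weights
    weights = softmax(gate_logits(features, gate_w))
    print(mix_experts(forecasts, weights))
```

Since the weights are computed from the input features, different input windows can favor different experts, which is the sense in which the fusion is adaptive.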

forecasting, large language model, machine learning, (19 more...)

2509.18107

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jul-4-2025

MISCGrasp: Leveraging Multiple Integrated Scales and Contrastive Learning for Enhanced Volumetric Grasping

Fan, Qingyu, Cai, Yinghao, Li, Chao, Jiao, Chunting, Zheng, Xudong, Lu, Tao, Liang, Bin, Wang, Shuo

Robotic grasping faces challenges in adapting to objects with varying shapes and sizes. In this paper, we introduce MISCGrasp, a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping. We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features, which synergistically strikes a balance between focusing on fine geometric details and overall geometric structures. Furthermore, MISCGrasp utilizes multi-scale contrastive learning to exploit similarities among positive grasp samples, ensuring consistency across multi-scale features. Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks. More details are available at https://miscgrasp.github.io/.

artificial intelligence, experiment, machine learning, (16 more...)

2507.02672

Country: Asia > China (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

arXiv.org Artificial Intelligence, Apr-22-2025

An Efficient Aerial Image Detection with Variable Receptive Fields

Wenbin, Liu

Aerial object detection using unmanned aerial vehicles (UAVs) faces critical challenges including sub-10px targets, dense occlusions, and stringent computational constraints. Existing detectors struggle to balance accuracy and efficiency due to rigid receptive fields and redundant architectures. To address these limitations, we propose Variable Receptive Field DETR (VRF-DETR), a transformer-based detector incorporating three key components: 1) Multi-Scale Context Fusion (MSCF) module that dynamically recalibrates features through adaptive spatial attention and gated multi-scale fusion, 2) Gated Convolution (GConv) layer enabling parameter-efficient local-context modeling via depthwise separable operations and dynamic gating, and 3) Gated Multi-scale Fusion (GMCF) Bottleneck that hierarchically disentangles occluded objects through cascaded global-local interactions. Experiments on VisDrone2019 demonstrate VRF-DETR achieves 51.4% mAP50 and 31.8% mAP50:95 with only 13.5M parameters. This work establishes a new efficiency-accuracy Pareto frontier for UAV-based detection tasks.
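The parameter efficiency that the abstract attributes to depthwise separable operations follows from a standard weight count, shown in the short sketch below. It covers only the general counting argument and does not model the paper's GConv layer or its gating. Biases are ignored, and the kernel size and channel widths are illustrative values, not figures from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution mixing channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128          # illustrative sizes only
    full = standard_conv_params(k, c_in, c_out)
    separable = depthwise_separable_params(k, c_in, c_out)
    print(full, separable, round(full / separable, 1))  # 73728 8768 8.4
```

For these sizes the separable form needs roughly one eighth of the weights, which is the kind of saving that makes such layers attractive under tight compute budgets.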

artificial intelligence, detection, machine learning, (19 more...)

2504.15165

Country:

Asia > China (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Robotics & Automation (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.35)